Conversation

@ARR4N ARR4N commented Sep 26, 2025

Introduces the saexec.Executor, which manages a FIFO queue of blocks for execution.
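
For orientation, the pattern in a minimal sketch (hypothetical names and types; saexec/saexec.go and saexec/execution.go define the real API):

package sketch

import "context"

// block stands in for the block type that the real Executor accepts.
type block struct{ number uint64 }

// executor is a hypothetical, simplified stand-in for saexec.Executor.
type executor struct {
	queue chan *block // blocks awaiting execution, in arrival (FIFO) order
}

// run drains the queue and executes blocks strictly in the order they were
// enqueued, stopping when the context is cancelled.
func (e *executor) run(ctx context.Context, execute func(*block) error) {
	for {
		select {
		case <-ctx.Done():
			return
		case b := <-e.queue:
			if err := execute(b); err != nil {
				return // the real Executor surfaces execution errors to callers
			}
		}
	}
}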

Recommended review order:

  1. saexec/saexec.go defines the Executor and getters.
  2. saexec/execution.go implements the execution loop
  3. saexec/subscriptions.go implements notification points for eth_subscribe
  4. saexec/saexec_test.go testing of the above
  5. All other files; those necessary for all of the above

Closes #14

@ARR4N ARR4N self-assigned this Sep 26, 2025
@ARR4N ARR4N marked this pull request as ready for review September 26, 2025 13:52
@ARR4N ARR4N marked this pull request as draft November 13, 2025 16:40
@ARR4N ARR4N marked this pull request as ready for review November 17, 2025 20:19
saexec/saexec.go Outdated

snaps := e.executeScratchSpace.snaps
snaps.Disable()
snaps.Release()
Contributor

Prior to Release, geth Journals the snapshot so that on restart it can be easily loaded. I don't think we necessarily need this, as we could just Cap(root, 0) since we are never going to reorg the state.

Should we be doing one of these here?
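
For concreteness, the two options look roughly like this (a sketch modelled on what geth's BlockChain.Stop() does; the types and wiring here are illustrative, not the saexec code):

package sketch

import (
	"github.com/ava-labs/libevm/common"
	"github.com/ava-labs/libevm/core/state/snapshot"
)

// trieCommitter abstracts the trie database; it is assumed here to expose
// geth's Commit(root, report) method.
type trieCommitter interface {
	Commit(root common.Hash, report bool) error
}

// persistSnapshots sketches the two shutdown options for a snapshot.Tree.
func persistSnapshots(snaps *snapshot.Tree, trieDB trieCommitter, root common.Hash, journal bool) error {
	if journal {
		// Option A: journal the diff layers so the snapshot can be reloaded
		// on restart, then commit the trie root the journal references.
		base, err := snaps.Journal(root)
		if err != nil {
			return err
		}
		return trieDB.Commit(base, true)
	}
	// Option B: flatten every diff layer into the disk layer; nothing needs
	// reloading on restart if we never reorg past persisted state.
	return snaps.Cap(root, 0)
}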

Contributor

Looking into this more closely, Cap(root, 0) seems to error if the root is the disk layer of the snap tree... How odd

Collaborator Author

I've gone with Cap() because it's simpler (doesn't require committing the trie root returned by Journal()) and presumably persists less due to flattening. The Cap([disk root], 0) error should really be nil because it's a no-op, so I ignore errors under those circumstances. I could also just not call it, but the code is cleaner without the nested if, and there are no side effects.
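
In sketch form, that decision looks something like this (illustrative only, not the merged code; diskRoot is a hypothetical value recorded when the snapshot tree was opened):

// Flatten all diff layers into the disk layer before releasing. Cap errors
// when root is already the disk layer, but that case is a no-op here, so
// the error is deliberately ignored.
if err := snaps.Cap(root, 0); err != nil && root != diskRoot {
	return err
}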

Contributor

(As commented below, I think I'd prefer checking before the call rather than ignoring the error after it.)
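
(Roughly, with diskRoot again hypothetical:)

// Only call Cap when there are diff layers above the disk layer to flatten,
// so no error needs to be ignored afterwards.
if root != diskRoot {
	if err := snaps.Cap(root, 0); err != nil {
		return err
	}
}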

// still leaks at shutdown. This is acceptable as we only ever have one
// [Executor], which we expect to be running for the entire life of the
// process.
goleak.IgnoreTopFunction("github.com/ava-labs/libevm/core/state/snapshot.(*diskLayer).generate"),
Contributor

So, I'm not sure if there is any better way to avoid this... But in production, we shouldn't see this as a leak because this only happens:

  1. If a snapshot regeneration is needed on startup.
  2. If the disk layer hasn't been written to since the snapshot has finished generation.

In all of our tests I don't think we are actually triggering any snapshot writes right now, since we don't process enough layers for the Cap(root, 128) call to actually start flushing anything to disk.
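
(For context, this is the usual way such an ignore option gets wired into a package's tests; a sketch, not necessarily what saexec_test.go does verbatim:)

package saexec_test

import (
	"testing"

	"go.uber.org/goleak"
)

// TestMain fails the package's tests if any goroutines are still running at
// exit, except for the snapshot generator discussed above, which is allowed
// to outlive shutdown.
func TestMain(m *testing.M) {
	goleak.VerifyTestMain(m,
		goleak.IgnoreTopFunction("github.com/ava-labs/libevm/core/state/snapshot.(*diskLayer).generate"),
	)
}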

Collaborator Author

There's a race in the upstream implementation (still there at tip, just moved to a different file). The goleak error includes:

[Goroutine 25 in state chan receive, with github.com/ava-labs/libevm/core/state/snapshot.(*diskLayer).generate on top of the stack:
github.com/ava-labs/libevm/core/state/snapshot.(*diskLayer).generate(0x140000d0360, 0x140000dd5c0)
	/.../go/pkg/mod/github.com/ava-labs/[email protected]/core/state/snapshot/generate.go:722 +0x51c

The blocking channel receive:

dl.lock.Lock()
dl.genMarker = nil
close(dl.genPending)
dl.lock.Unlock()

// Someone will be looking for us, wait it out
abort = <-dl.genAbort // <------ L#722 referenced by goleak
abort <- nil

But snapshot.Tree.Disable() bails early (because layer.genMarker was set to nil above) and never sends to layer.genAbort.

layer.lock.RLock()
generating := layer.genMarker != nil // <------ Is happening after the above snippet unlocks
layer.lock.RUnlock()
if !generating {
	// Generator is already aborted or finished
	break
}
// If the base layer is generating, abort it
if layer.genAbort != nil {
	abort := make(chan *generatorStats)
	layer.genAbort <- abort
	<-abort
}

@ARR4N ARR4N mentioned this pull request Nov 19, 2025
ARR4N and others added 3 commits November 19, 2025 17:33
Contributor

@StephenButtolph StephenButtolph left a comment

I'm not confident that the state is being managed 100% correctly (w.r.t. flushing and shutdown). But I feel like we can make that an issue to follow up on after merging.

@ARR4N
Collaborator Author

ARR4N commented Nov 19, 2025

> I'm not confident that the state is being managed 100% correctly (w.r.t. flushing and shutdown). But I feel like we can make that an issue to follow up on after merging.

Added issue #29

@ARR4N ARR4N merged commit f2c79b7 into main Nov 19, 2025
11 checks passed
@ARR4N ARR4N deleted the arr4n/saexec branch November 19, 2025 20:10